An Attempt to Predict the Prevalence of Wasting by WHZ, Grounded in Insights from Historical Data

A case study using five years of Somalia nutrition surveys

17 January 2026

Analysis workflow




flowchart LR
  subgraph DW["Data Wrangling"]
    zscores[Compute Z-scores]
    zscores --> WHZ --> Define[/Define wasting/] --> Exclude([Exclude outliers in WHZ and MFAZ])
    zscores --> MFAZ --> Define --> Exclude

  end

  subgraph PC["Plausibility Check"]
    Plausible[Plausibility Check]
    Plausible --> wfhz[WHZ] --> FLFLT([Flawless or Faulty])
    Plausible --> mfaz[MFAZ] --> FLFLT
  end

  subgraph SFET["Split FE and Testing sets"]
    FLW[Flawless] --> FLW_TS_RDM[/Time & Random based/]
    FLW_TS_RDM --> FLWFE[/F.Extraction: 80%/] --> FLW_TSFE_set([31,066])
    FLW_TS_RDM --> FLWT[/Testing: 20%/] --> FLW_TST_set([7,767])
    FLT[Faulty] --> FLT_TS_RDM[/Time & Random based/]
    FLT_TS_RDM --> FLTFE[/F.Extraction: 80%/] --> FLT_TSFE_set([74,075])
    FLT_TS_RDM --> FLTT[/Testing: 20%/] --> FLT_TST_set([18,519])
  end

  subgraph FEX["Feature Extraction"]
    burden[Total wasting]
    burden --> WHZMUAC[WHZ or MUAC]
    burden --> WHZMFAZ[WHZ or MFAZ]
    RAT[Prevalence Ratio]
    WHZMUAC --> prop1[/%WHZ/] --> Med([Median])
    WHZMUAC --> prop2[/%MUAC/] --> Med
    WHZMFAZ --> prop1
    WHZMFAZ --> prop3[/%MFAZ/] --> Med
    RAT --> ratio1[/WHZ:MUAC/] --> Med
    RAT --> ratio2[/WHZ:MFAZ/] --> Med
  end

  subgraph MOD["Model"]
    model[Predict and Test Accuracy]
    model --> metric1[Mean Absolute Error]
    model --> metric2[Mean Absolute Percent Error]
  end

DW --> PC --> SFET --> FEX --> MOD
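
The "Split FE and Testing sets" step can be illustrated with a minimal Python sketch. The function name `split_80_20`, the `survey_date` field, and the rule "earliest 80% for feature extraction, most recent 20% for testing" are assumptions made for illustration; the slides only say that the split was time-based and random-based, and do not show the original code.

```python
import random


def split_80_20(records, by_time=True, date_key="survey_date", seed=42):
    """Split records into (feature_extraction, testing) at an 80/20 ratio.

    by_time=True  -> sort by date; the earliest 80% go to feature extraction
                     (assumed rule, not confirmed by the slides).
    by_time=False -> shuffle with a fixed seed, then cut at 80%.
    """
    if by_time:
        # ISO-formatted date strings (YYYY-MM-DD) sort correctly as text.
        ordered = sorted(records, key=lambda r: r[date_key])
    else:
        ordered = list(records)
        random.Random(seed).shuffle(ordered)
    # Integer arithmetic avoids floating-point rounding at the cut point.
    cut = len(ordered) * 4 // 5
    return ordered[:cut], ordered[cut:]


# Hypothetical records: only the date field matters for the split.
records = [{"survey_date": day} for day in range(38833)]
fe_time, test_time = split_80_20(records, by_time=True)
fe_rdm, test_rdm = split_80_20(records, by_time=False)
```

With 38,833 records, this cut yields 31,066 and 7,767, which matches the "Flawless" counts in the flowchart; likewise 92,594 records yield 74,075 and 18,519, matching the "Faulty" counts. The original analysis may nonetheless have split at a different level (for example, per survey rather than per record).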

Prediction accuracy

Error = | Observed - Predicted |
Model                 MAE [1]   MAE of ratio [2]   MAPE [3]   MAPE of ratio
muac_wfhz_fl_time     7.97      7.93               60.88%     60.62%
muac_wfhz_fl_rdm      5.64      5.84               48.30%     49.98%
mfaz_wfhz_fl_time     6.95      6.86               51.10%     50.50%
mfaz_wfhz_fl_rdm      4.65      4.58               40.79%     39.40%
muac_wfhz_flt_time    7.80      7.74               65.34%     64.86%
muac_wfhz_flt_rdm     8.28      8.25               82.22%     81.95%
mfaz_wfhz_flt_time    5.50      5.47               43.36%     43.10%
mfaz_wfhz_flt_rdm     5.55      5.50               54.44%     54.00%

[1] Mean Absolute Error of proportion-based model
[2] Mean Absolute Error of prevalence-ratio-based model
[3] Mean Absolute Percent Error of proportion-based model
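
To make the two metrics concrete, the sketch below shows one way a prevalence-ratio-based prediction could be scored. It is an illustration under stated assumptions, not the code behind the table: the function names, the field names `whz_prev` and `muac_prev`, the rule "predicted WHZ prevalence = MUAC prevalence × median WHZ:MUAC ratio", and all numbers are hypothetical. MAPE follows the usual definition, dividing each absolute error by the observed value.

```python
from statistics import median


def median_ratio(fe_surveys):
    """Median WHZ:MUAC prevalence ratio over the feature-extraction surveys."""
    return median(
        s["whz_prev"] / s["muac_prev"] for s in fe_surveys if s["muac_prev"] > 0
    )


def predict_whz(muac_prev, ratio):
    """Assumed prediction rule: scale the MUAC prevalence by the median ratio."""
    return muac_prev * ratio


def mae(observed, predicted):
    """Mean Absolute Error: mean of |observed - predicted|."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def mape(observed, predicted):
    """Mean Absolute Percent Error, skipping surveys with zero observed prevalence."""
    pairs = [(o, p) for o, p in zip(observed, predicted) if o != 0]
    return 100 * sum(abs(o - p) / o for o, p in pairs) / len(pairs)


# Made-up prevalences (in percent), for illustration only.
fe_surveys = [
    {"whz_prev": 12.0, "muac_prev": 8.0},
    {"whz_prev": 10.0, "muac_prev": 10.0},
    {"whz_prev": 15.0, "muac_prev": 7.5},
]
ratio = median_ratio(fe_surveys)

test_muac = [8.0, 4.0]      # MUAC prevalence in the testing surveys
test_observed = [10.0, 8.0]  # observed WHZ prevalence in the same surveys
test_predicted = [predict_whz(m, ratio) for m in test_muac]

print(mae(test_observed, test_predicted))
print(mape(test_observed, test_predicted))
```

Because MAE is expressed in the same units as prevalence while MAPE is relative to the observed value, a low-prevalence survey can contribute a small absolute error but a large percent error, which helps explain why the MAPE figures in the table are much larger than the MAE figures.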

Conclusion


  • MFAZ was the best-performing indicator for predicting the prevalence of wasting by WHZ: its MAE was lower than that of raw MUAC in every data-quality and split combination (4.65 to 6.95 versus 5.64 to 8.28).

Caution

Nonetheless, the level of uncertainty remains high: on average, predictions under- or over-estimate the observed prevalence by about five percentage points.

  • The proportion-based and prevalence-ratio-based prediction approaches lead to nearly identical results, so either can be used.